Course/Section: CSC 580 AI 2 Final Project
Assignment Name: Reinforcement Learning Analysis in Highway Environments
Group: 40 | Names: Om Prakash Gunja (2131025), Raju Meesala (2119844)
# ## Code to mount Google Drive (when running in Colab)
# from google.colab import drive
# drive.mount("/content/drive") # the Google Drive root directory is mapped here

# Change the working directory to your own work directory (where the code file is).
import os
thisdir = '/Users/omprakashgunja/Documents/Classes/Winter 2025/AI 2.0/CSC580_Winter2025/Final Project/Finals/PPO_experiments'
os.chdir(thisdir)
# Ensure the files are there (in the folder)
!pwd
/Users/omprakashgunja/Documents/Classes/Winter 2025/AI 2.0/CSC580_Winter2025/Final Project/Finals/PPO_experiments
# Install environment and agent
%pip install highway-env
# NOTE: we use the bleeding-edge version of stable-baselines3 because the current
# stable version does not support the latest gym>=0.21 releases. If necessary,
# revert to stable at the next SB3 release.
%pip install git+https://github.com/DLR-RM/stable-baselines3 2>/dev/null #1>&2
%pip install stable-baselines3
(pip output omitted: highway-env 1.10.1, gymnasium 1.0.0, stable-baselines3 2.5.0, and their dependencies were already satisfied.)
# Environment
import gymnasium as gym # Note: gymnasium is already installed in Colab
import highway_env # noqa: F401
from gymnasium.wrappers import RecordVideo
gym.register_envs(highway_env) # register the highway environments with gymnasium
# Agent
from stable_baselines3 import DQN, PPO # PPO is what this notebook trains and loads; add more to experiment with others
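As a quick sanity check (a minimal sketch, not part of the original run), we can confirm the highway environments registered and inspect their default spaces; the exact shapes depend on the highway-env configuration:
probe_env = gym.make("highway-fast-v0")
print(probe_env.action_space)       # discrete meta-actions (e.g. Discrete(5))
print(probe_env.observation_space)  # kinematics observation matrix (e.g. a Box of shape (5, 5))
probe_env.close()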
# Visualization utils -- including tensorboard
%load_ext tensorboard
import sys
from tqdm.notebook import trange
%pip install tensorboardx gym pyvirtualdisplay
%pip install tensorboard
# %apt-get install -y xvfb ffmpeg
The tensorboard extension is already loaded. To reload it, use:
%reload_ext tensorboard
(pip output omitted: tensorboardx 2.6.2.2, gym 0.26.2, pyvirtualdisplay 3.0, tensorboard 2.18.0, and their dependencies were already satisfied.)
def run_record(vdir, modelPath, vdofileName, env, forHowLong, freq):
    # Run the trained model and record a video
    video_dir = vdir # e.g. './videos/highway/PPO'
    if not os.path.exists(video_dir):
        os.makedirs(video_dir)
    # Enable path projection and rendering settings
    env.unwrapped.config["show_trajectories"] = True # Enable trajectories
    env.unwrapped.config["simulation_frequency"] = freq # Higher FPS for rendering
    env.unwrapped.config["policy_frequency"] = 5
    env.unwrapped.configure(env.unwrapped.config) # Apply the updated configuration
    # env = RecordVideo(env, video_folder=video_dir) # wrap env in video recording
    # Load the saved checkpoint; update the path if the file is named differently.
    # NOTE: every checkpoint saved in this notebook is a PPO model, so load with PPO.
    saved_model = modelPath # e.g. '../saved_models/PPO/ppo_highway_hyper3.zip'
    model = PPO.load(saved_model, env=env, print_system_info=False)
    # The RecordVideo wrapper is incompatible with highway-env here, so instead
    # collect the rendered frames manually and build a video with moviepy.
    import moviepy as mpy
    import numpy as np
    frames = []
    for episode in range(forHowLong): # number of episodes to record
        done = truncated = False
        obs, info = env.reset()
        while not (done or truncated):
            # Predict the next action
            action, _states = model.predict(obs)
            # Step the environment and collect the reward
            obs, reward, done, truncated, info = env.step(action)
            # Render and append the frame
            frames.append(np.array(env.render()))
    env.close()
    # Save the collected frames as a video
    clip = mpy.ImageSequenceClip(frames, fps=freq)
    videofile = video_dir + vdofileName # e.g. "/highway_hyper3_video.mp4"
    clip.write_videofile(videofile)

import glob
import io
from IPython import display as ipythondisplay
from IPython.display import HTML
import base64
def show_video(videofile):
    mp4list = glob.glob(videofile)
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        ipythondisplay.display(HTML(data='''<video alt="test" autoplay
            loop controls style="height: 300px;">
            <source src="data:video/mp4;base64,{0}" type="video/mp4" />
        </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")

First, set up TensorBoard locally to visualize training.
# set the tensorboard log directory
tb_log_dir = "./tensorboard_logs"
if not os.path.exists(tb_log_dir):
    os.makedirs(tb_log_dir)
# %load_ext tensorboard
# %tensorboard --logdir tb_log_dir
# %reload_ext tensorboard

# # create an environment
# env = gym.make("highway-fast-v0", render_mode="rgb_array")
# obs, info = env.reset()
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.evaluation import evaluate_policy
import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn
# Create the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
obs, info = env.reset()
policy_kwargs = dict(
    net_arch=dict(pi=[256, 256], vf=[256, 256]) # Custom network architecture
)
# Create the PPO model with tuning parameters and custom network architecture.
# Note: passing the env id string lets SB3 construct and wrap the env itself.
model = PPO(
    "MlpPolicy",
    'highway-fast-v0',
    learning_rate=3e-4,         # Base learning rate
    n_steps=2048,               # Number of steps to run for each environment per update
    batch_size=64,              # Batch size for each update
    n_epochs=10,                # Number of epochs when optimizing the surrogate loss
    gamma=0.99,                 # Discount factor
    gae_lambda=0.95,            # Lambda for Generalized Advantage Estimation
    clip_range=0.2,             # Clipping parameter for the surrogate loss
    ent_coef=0.0,               # Coefficient for the entropy loss term
    vf_coef=0.7,                # Value function loss coefficient
    verbose=0,
    tensorboard_log=tb_log_dir, # Tensorboard log directory for tuning analysis
    policy_kwargs=policy_kwargs # Pass custom policy keyword arguments
)
# Train the model for a reduced number of timesteps for the preliminary experiment
model.learn(total_timesteps=int(2e4))
<stable_baselines3.ppo.ppo.PPO at 0x369459030>
# save the model in the specified directory
saved_model_dir = "../saved_models/PPO"
if not os.path.exists(saved_model_dir):
    os.makedirs(saved_model_dir)
# Save model with custom name
model.save(f"{saved_model_dir}/ppo_highway_base.zip")

%reload_ext tensorboard
%tensorboard --logdir './tensorboard_logs'
The tensorboard extension is already loaded. To reload it, use:
%reload_ext tensorboard
Reusing TensorBoard on port 6010 (pid 54023), started 0:02:42 ago. (Use '!kill 54023' to kill it.)
# Evaluate the agent
from stable_baselines3.common.evaluation import evaluate_policy
# NOTE: If you use wrappers with your environment that modify rewards,
# this will be reflected here. To evaluate with original rewards,
# wrap environment in a "Monitor" wrapper before other wrappers.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f'{mean_reward}, {std_reward}')
20.920220999999998, 0.9433981132056604
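Following the note above, a minimal sketch (not part of the original run) of evaluating on a freshly Monitor-wrapped environment, so the reported statistics use the environment's original rewards:
from stable_baselines3.common.monitor import Monitor
# Wrap a fresh env in Monitor before any other wrapper so original rewards are logged
monitored_env = Monitor(gym.make("highway-fast-v0"))
mean_r, std_r = evaluate_policy(model, monitored_env, n_eval_episodes=10)
print(f"monitored eval: {mean_r:.2f} +/- {std_r:.2f}")
monitored_env.close()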
import gymnasium as gym
vdoFile = "/ppo_highway_base.mp4"
modelPath = '../saved_models/PPO/ppo_highway_base.zip'
vdir = './videos/highway/PPO'
# Create a new environment for testing
env.reset()
env = gym.make("highway-v0", render_mode="rgb_array")
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=15, freq=15)
show_video(vdir + vdoFile)
MoviePy - Building video ./videos/highway/PPO/ppo_highway_base.mp4.
MoviePy - Writing video ./videos/highway/PPO/ppo_highway_base.mp4
MoviePy - Done !
MoviePy - video ready ./videos/highway/PPO/ppo_highway_base.mp4
import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn
# Create the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
obs, info = env.reset()
policy_kwargs = dict(
    activation_fn=nn.ReLU, # Activation function
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128]) # Custom network architecture
)
# Create the PPO model with tuning parameters and custom network architecture
model = PPO(
    "MlpPolicy",
    'highway-fast-v0',
    learning_rate=3e-4,       # Base learning rate
    n_steps=4096,             # Number of steps to run for each environment per update
    batch_size=128,           # Batch size for each update
    n_epochs=20,              # Number of epochs when optimizing the surrogate loss
    gamma=0.99,               # Discount factor
    gae_lambda=0.95,          # Lambda for Generalized Advantage Estimation
    clip_range=0.2,           # Clipping parameter for the surrogate loss
    ent_coef=0.0,             # Coefficient for the entropy loss term
    vf_coef=0.7,              # Value function loss coefficient
    verbose=0,
    tensorboard_log=tb_log_dir + '/hyper1', # Tensorboard log directory for tuning analysis
    policy_kwargs=policy_kwargs # Pass custom policy keyword arguments
)
# Train the model for a reduced number of timesteps for the preliminary experiment
model.learn(total_timesteps=int(2e4))
<stable_baselines3.ppo.ppo.PPO at 0x302200790>
# save the model in the specified directory
saved_model_dir = "../saved_models/PPO"
if not os.path.exists(saved_model_dir):
    os.makedirs(saved_model_dir)
# Save model with custom name
model.save(f"{saved_model_dir}/ppo_highway_hyper1.zip")

%reload_ext tensorboard
%tensorboard --logdir './tensorboard_logs'
The tensorboard extension is already loaded. To reload it, use:
%reload_ext tensorboard
Reusing TensorBoard on port 6010 (pid 54023), started 0:07:10 ago. (Use '!kill 54023' to kill it.)
# Evaluate the agent
from stable_baselines3.common.evaluation import evaluate_policy
# NOTE: If you use wrappers with your environment that modify rewards,
# this will be reflected here. To evaluate with original rewards,
# wrap environment in a "Monitor" wrapper before other wrappers.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f'{mean_reward}, {std_reward}')
20.820220999999997, 0.8717797887081347
import gymnasium as gym
vdoFile = "/ppo_highway_hyper1.mp4"
modelPath = '../saved_models/PPO/ppo_highway_hyper1.zip'
vdir = './videos/highway/PPO'
# Create a new environment for testing
env.reset()
env = gym.make("highway-v0", render_mode="rgb_array")
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=15, freq=15)
show_video(vdir + vdoFile)
MoviePy - Building video ./videos/highway/PPO/ppo_highway_video.mp4.
MoviePy - Writing video ./videos/highway/PPO/ppo_highway_video.mp4
MoviePy - Done !
MoviePy - video ready ./videos/highway/PPO/ppo_highway_video.mp4
After tuning the hyperparameters, we observed that while the value function improved significantly (higher explained variance and lower value loss), the overall policy performance declined. The hyperparameter-tuned model exhibited shorter episode lengths and lower rewards compared to the baseline PPO, suggesting that the agent struggled to generalize its decision-making effectively. This was likely due to over-restrictive policy updates, as reflected in the training diagnostics.
For the next hyperparameter tuning, we will make the following changes:
- Lower clip_range to 0.15: tightens the clipping bound, forcing smoother, more conservative policy updates.
- Raise ent_coef to 0.01: encourages exploration and prevents premature convergence.
- Lower learning_rate to 2e-4: slows down aggressive policy updates while maintaining learning stability.
With these adjustments, we expect smoother yet better-exploring policy updates; the role of clip_range is recalled below.
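For reference, clip_range is the epsilon in PPO's standard clipped surrogate objective, which bounds how far the probability ratio can move in a single update:

L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

Shrinking epsilon from 0.2 to 0.15 narrows the effective trust region, so each update moves the policy less.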
import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn
# Create the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
obs, info = env.reset()
# Updated policy architecture and parameters
policy_kwargs = dict(
    activation_fn=nn.ReLU, # Activation function
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128]) # Custom network architecture
)
# Create PPO model with improved hyperparameters
model = PPO(
    "MlpPolicy",
    'highway-fast-v0',
    learning_rate=2e-4,       # Reduced learning rate for more stable updates
    n_steps=4096,             # Number of steps per update
    batch_size=128,           # Batch size per update
    n_epochs=20,              # More training iterations per update
    gamma=0.99,               # Discount factor
    gae_lambda=0.95,          # Lambda for Generalized Advantage Estimation
    clip_range=0.15,          # Reduced clipping range for smoother updates
    ent_coef=0.01,            # Increased entropy coefficient to encourage exploration
    vf_coef=0.7,              # Value function coefficient
    verbose=0,
    tensorboard_log=tb_log_dir + '/hyper2', # Tensorboard log directory
    policy_kwargs=policy_kwargs # Custom policy settings
)
# Train the model for an extended number of timesteps
model.learn(total_timesteps=int(5e4)) # Increased timesteps for better convergence
# Save the trained model
model.save(f"{saved_model_dir}/ppo_highway_hyper2.zip")# Evaluate the agent
from stable_baselines3.common.evaluation import evaluate_policy
# NOTE: If you use wrappers with your environment that modify rewards,
# this will be reflected here. To evaluate with original rewards,
# wrap environment in a "Monitor" wrapper before other wrappers.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f'{mean_reward}, {std_reward}')
19.3868877, 4.25166624042857
%load_ext tensorboard
%tensorboard --logdir './tensorboard_logs'
The tensorboard extension is already loaded. To reload it, use:
%reload_ext tensorboard
Reusing TensorBoard on port 6010 (pid 54023), started 0:32:38 ago. (Use '!kill 54023' to kill it.)
import gymnasium as gym
vdoFile = "/ppo_highway_hyper2.mp4"
modelPath = '../saved_models/PPO/ppo_highway_hyper2.zip'
vdir = './videos/highway/PPO'
# Create a new environment for testing
env.reset()
env = gym.make("highway-v0", render_mode="rgb_array")
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
show_video(vdir + vdoFile)
MoviePy - Building video ./videos/highway/PPO/ppo_highway_hyper2.mp4.
MoviePy - Writing video ./videos/highway/PPO/ppo_highway_hyper2.mp4
MoviePy - Done !
MoviePy - video ready ./videos/highway/PPO/ppo_highway_hyper2.mp4
The updated hyperparameter tuning ("hyper2") showed improvements in the value function, with higher explained variance and lower value loss. However, improvements in episode reward and length remain modest compared to the baseline. This suggests that reducing clip_range to 0.15 and increasing ent_coef to 0.01 helped the agent learn more stable value estimates and smoother policy updates, but did not by itself produce better driving behavior.
For the next tuning iteration ("hyper3"), our objective is to increase lane-switching efficiency, improve speed control, and balance risk-taking. The following changes have been implemented:
- Increase ent_coef to 0.02: more exploration to prevent hesitation.
- Increase vf_coef to 0.75: encourages better value function estimation.
- Lower gamma to 0.97: de-emphasizes long-term planning in favor of speed and overtaking.
- Keep learning_rate at 2e-4 for stable and gradual updates.
- Keep n_steps=4096 and batch_size=128 to support longer rollouts and gradient stability.
- Keep n_epochs=20 to allow ample updates per training cycle.
- Keep gae_lambda=0.95 for controlled variance in the advantage estimation (see the formula below).
- Keep clip_range=0.15 to ensure smooth policy updates.
With these changes, we expect the agent to switch lanes and overtake more confidently while keeping training stable.
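For reference, Generalized Advantage Estimation computes the advantage as an exponentially weighted sum of TD residuals, with gae_lambda = lambda trading bias against variance:

\hat{A}_t = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\,\delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)

With gamma = 0.97 and lambda = 0.95, credit decays by a factor of gamma*lambda (about 0.92) per step, giving an effective credit-assignment horizon of roughly 13 steps, which matches the intent of favoring near-term speed and overtaking rewards.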
import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn
# Create the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
obs, info = env.reset()
policy_kwargs = dict(
    activation_fn=nn.ReLU, # Activation function
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128]) # Custom network architecture
)
# Create PPO model with refined hyperparameters
model = PPO(
    "MlpPolicy",
    'highway-fast-v0',
    learning_rate=2e-4,       # Stable learning rate
    n_steps=4096,             # Longer rollout for long-term decisions
    batch_size=128,           # Large batch size for gradient stability
    n_epochs=20,              # More training epochs per update
    gamma=0.97,               # Reduce long-term planning to favor speed & overtaking
    gae_lambda=0.95,          # GAE parameter for variance control
    clip_range=0.15,          # Slightly lower clip range for smooth policy updates
    ent_coef=0.02,            # More exploration to prevent hesitation
    vf_coef=0.75,             # Encourage better value function estimation
    verbose=0,
    tensorboard_log=tb_log_dir + '/hyper3', # New TensorBoard log directory
    policy_kwargs=policy_kwargs # Custom policy settings
)
# Train the model for more timesteps to allow better adaptation
model.learn(total_timesteps=int(6e4)) # Increased timesteps for stability
# Save the trained model
model.save(f"{saved_model_dir}/ppo_highway_hyper3.zip")# Evaluate the agent
from stable_baselines3.common.evaluation import evaluate_policy
# NOTE: If you use wrappers with your environment that modify rewards,
# this will be reflected here. To evaluate with original rewards,
# wrap environment in a "Monitor" wrapper before other wrappers.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f'{mean_reward}, {std_reward}')
21.120220999999997, 0.8306623862918076
%tensorboard --logdir './tensorboard_logs'
Reusing TensorBoard on port 6010 (pid 55268), started 0:00:12 ago. (Use '!kill 55268' to kill it.)
import gymnasium as gym
vdoFile = "/ppo_highway_hyper3.mp4"
modelPath = '../saved_models/PPO/ppo_highway_hyper3.zip'
vdir = './videos/highway/PPO'
# Create a new environment for testing
env.reset()
env = gym.make("highway-v0", render_mode="rgb_array")
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
show_video(vdir + vdoFile)
MoviePy - Building video ./videos/highway/PPO/ppo_highway_hyper3.mp4.
MoviePy - Writing video ./videos/highway/PPO/ppo_highway_hyper3.mp4
MoviePy - Done !
MoviePy - video ready ./videos/highway/PPO/ppo_highway_hyper3.mp4
After tuning the hyperparameters for the third iteration (Hyper3), we observed a few notable changes in training dynamics and performance. For reference, the evaluate_policy results above (mean reward +/- std over 10 episodes) were: Baseline 20.92 +/- 0.94, Hyper1 20.82 +/- 0.87, Hyper2 19.39 +/- 4.25, and Hyper3 21.12 +/- 0.83. Compared to the Baseline PPO (PPO_1), Hyper1, and Hyper2, Hyper3 demonstrates improved episode length and reward stability, but its real-world performance remains suboptimal, especially in terms of speed control, lane-switching confidence, and overtaking behavior.
- Hyper3 achieves the longest episode length (ep_len_mean = 29.07), indicating better survival.
- Hyper3 has a low KL divergence (approx_kl = 0.0053), which means policy updates are too small.
- Despite the higher entropy coefficient (ent_coef=0.02), the agent does not accelerate aggressively and often lags behind other vehicles.
- The training reward remains modest (ep_rew_mean = 20.43), suggesting that speed and overtaking strategies are not being optimized effectively.
Since Hyper3 does not show clear improvements in lane-switching and overtaking, the next steps are to analyze the model's behavior in the Merge Environment and to design a custom reward structure that improves real-world driving performance.
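On reading the approx_kl figure: recent stable-baselines3 releases log approx_kl with the low-variance estimator

\hat{D}_{\mathrm{KL}} \approx \mathbb{E}_t\left[\left(r_t(\theta) - 1\right) - \log r_t(\theta)\right]

computed over the sampled probability ratios, so a value near zero (such as 0.0053) means the updated policy has barely moved from the old one, consistent with the tight clip_range.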
import gymnasium as gym
vdoFile = "/ppo_merge_hyper3.mp4"
modelPath = '../saved_models/PPO/ppo_highway_hyper3.zip'
vdir = './videos/merge/PPO'
# Create a new environment for testing
env.reset()
env = gym.make("merge-v0", render_mode="rgb_array")
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
(Per-step crash/over debug output omitted: both flags read False on nearly every step; episodes terminate with over=True, and one episode ends with crash=True.)
MoviePy - Building video ./videos/merge/PPO/ppo_merge_hyper3.mp4.
MoviePy - Writing video ./videos/merge/PPO/ppo_merge_hyper3.mp4
MoviePy - Done !
MoviePy - video ready ./videos/merge/PPO/ppo_merge_hyper3.mp4
show_video(vdir + vdoFile)

import gymnasium as gym
import numpy as np
from gymnasium import RewardWrapper

class CustomHighwayReward(RewardWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.prev_lane = None # Track lane changes

    def reward(self, reward):
        """Modify reward based on speed, overtaking, and lane-switching efficiency."""
        # Get the ego vehicle's state
        ego_vehicle = self.unwrapped.vehicle
        speed = ego_vehicle.speed # Current vehicle speed
        lane_index = ego_vehicle.lane_index[2] # Extract lane position
        # Reward for maintaining high speed
        speed_bonus = speed / 30 # Normalize speed to a maximum of ~30
        # Encourage overtaking: count slower vehicles behind the ego vehicle
        num_overtakes = sum(
            1 for v in self.unwrapped.road.vehicles
            if v.position[0] < ego_vehicle.position[0] and v.speed < ego_vehicle.speed
        )
        overtake_bonus = num_overtakes * 0.5 # Reward per vehicle overtaken
        # Penalty for unnecessary lane switching
        lane_switch_penalty = -0.2 if self.prev_lane is not None and self.prev_lane != lane_index else 0
        # Update the previous lane for the next step's comparison
        self.prev_lane = lane_index
        # Compute the final shaped reward
        return reward + speed_bonus + overtake_bonus + lane_switch_penalty

# Wrap the environment
env = gym.make("highway-fast-v0", render_mode="rgb_array")
env = CustomHighwayReward(env)
obs, info = env.reset()
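A quick sanity check (a minimal sketch, not part of the original run) that the wrapper produces shaped rewards: step a wrapped copy of the env with random actions and print the result.
# Probe the shaped reward with random actions (hypothetical check)
check_env = CustomHighwayReward(gym.make("highway-fast-v0"))
obs, info = check_env.reset(seed=0)
for _ in range(5):
    action = check_env.action_space.sample() # random probe action
    obs, shaped_r, done, truncated, info = check_env.step(action)
    print(f"shaped reward: {shaped_r:.3f}")
    if done or truncated:
        obs, info = check_env.reset()
check_env.close()

# Define the policy network structure (same as before to maintain learning consistency)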
policy_kwargs = dict(
    activation_fn=nn.ReLU,
    net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128])
)
# Load the pretrained model from `hyper3`
model = PPO.load(f"{saved_model_dir}/ppo_highway_hyper3.zip", env=env, custom_objects={"policy_kwargs": policy_kwargs})
# Continue training in the new environment
model.learn(
    total_timesteps=int(4e4),   # Fine-tune with 40,000 steps
    reset_num_timesteps=False   # Do NOT reset timesteps (continue from previous training)
)
# Save the fine-tuned model
model.save(f"{saved_model_dir}/ppo_merge_hyper4.zip")

mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f'{mean_reward}, {std_reward}')
43.6517909, 4.665951590621669
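Note that this evaluation runs on the shaped CustomHighwayReward env (model.get_env()), so the jump to roughly 43.65 largely reflects the added speed and overtake bonuses rather than a directly comparable gain over the earlier ~21. A minimal sketch (not part of the original run) for comparing on the same scale is to evaluate on the unshaped base env:
from stable_baselines3.common.monitor import Monitor
# Evaluate on the base env so the reward scale matches the earlier experiments
base_eval_env = Monitor(gym.make("highway-fast-v0"))
mean_r, std_r = evaluate_policy(model, base_eval_env, n_eval_episodes=10)
print(f"unshaped eval: {mean_r:.2f} +/- {std_r:.2f}")
base_eval_env.close()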
vdoFile = "/ppo_merge_hyper4.mp4"
modelPath = '../saved_models/PPO/ppo_merge_hyper4.zip'
vdir = './videos/merge/PPO'
# Create a new environment for testing
env.reset()
env = gym.make("merge-v0", render_mode="rgb_array")
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
(Per-step crash/over debug output omitted: both flags read False on nearly every step; episodes terminate with over=True, and one episode ends with crash=True.)
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overTrue
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overTrue
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashTrue
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overFalse
crashFalse
overTrue
MoviePy - Building video ./videos/merge/PPO/ppo_merge_hyper4.mp4.
MoviePy - Writing video ./videos/merge/PPO/ppo_merge_hyper4.mp4
MoviePy - Done !
MoviePy - video ready ./videos/merge/PPO/ppo_merge_hyper4.mp4
show_video(vdir+vdoFile)
After fine-tuning the previously trained model in the merge-v0 environment, the agent's performance improved in several key areas.
Despite these improvements, further testing is needed to assess the model’s performance in more complex environments.
The next phase of evaluation will test the fine-tuned PPO model in the roundabout-v0 environment. This will determine whether additional tuning or reward modifications are necessary to handle the navigation challenges of a roundabout effectively.
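As a quantitative complement to the recorded video below, a short evaluation loop can turn the per-step over/crash flags from the log above into a crash rate and a mean return. This is a minimal sketch of our own, not part of the notebook's run_record helper; the checkpoint path mirrors the one used in the next cell and may need adjusting.
# Hedged evaluation sketch (our addition): replay the hyper4 checkpoint in
# roundabout-v0 and tally crashes and returns over a few episodes.
import gymnasium as gym
from stable_baselines3 import PPO

eval_env = gym.make("roundabout-v0", render_mode="rgb_array")
eval_model = PPO.load('../saved_models/PPO/ppo_merge_hyper4.zip', env=eval_env)

n_episodes, crashes, returns = 10, 0, []
for _ in range(n_episodes):
    obs, _ = eval_env.reset()
    done, ep_return = False, 0.0
    while not done:
        action, _ = eval_model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = eval_env.step(action)
        ep_return += reward
        done = terminated or truncated          # the "over" flag in the step log
    crashes += int(info.get("crashed", False))  # the "crash" flag in the step log
    returns.append(ep_return)
print(f"crash rate: {crashes}/{n_episodes}, mean return: {sum(returns)/n_episodes:.2f}")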
vdoFile = "/ppo_rounabout_hyper4.mp4"
modelPath = '../saved_models/PPO/ppo_merge_hyper4.zip'
vdir = './videos/roundabout/PPO'
# Create a new environment for testing
env = gym.make("roundabout-v0", render_mode="rgb_array")
env.reset()
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
MoviePy - Building video ./videos/roundabout/PPO/ppo_rounabout_hyper4.mp4.
MoviePy - Writing video ./videos/roundabout/PPO/ppo_rounabout_hyper4.mp4
MoviePy - Done !
MoviePy - video ready ./videos/roundabout/PPO/ppo_rounabout_hyper4.mp4
show_video(vdir+vdoFile)
After testing the trained PPO model in the roundabout-v0 environment, several key issues were identified.
The observed behavior suggests that the agent over-prioritizes safety, leading to an overly cautious and inactive policy. This prevents it from effectively participating in traffic flow.
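To back this observation with numbers, a small diagnostic of our own can measure mean speed and the fraction of near-idle steps; it assumes the ego speed reported in highway-env's step info dict (info["speed"]).
# Diagnostic sketch (our addition): quantify hesitation as mean speed and
# the fraction of steps spent nearly standing still.
import gymnasium as gym
from stable_baselines3 import PPO

diag_env = gym.make("roundabout-v0", render_mode="rgb_array")
diag_model = PPO.load('../saved_models/PPO/ppo_merge_hyper4.zip', env=diag_env)

speeds = []
for _ in range(5):  # a handful of episodes gives a rough picture
    obs, _ = diag_env.reset()
    done = False
    while not done:
        action, _ = diag_model.predict(obs, deterministic=True)
        obs, _, terminated, truncated, info = diag_env.step(action)
        speeds.append(info["speed"])
        done = terminated or truncated
idle_frac = sum(s < 1.0 for s in speeds) / len(speeds)
print(f"mean speed: {sum(speeds)/len(speeds):.1f} m/s, idle steps: {idle_frac:.0%}")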
To address these issues, the next iteration (hyper5) will include:
- A higher entropy coefficient (ent_coef=0.03) to encourage more exploration.
- A lower discount factor (gamma=0.95) to prioritize short-term gains like merging effectively.
- Retraining the model (hyper5) in the roundabout environment.
import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn
from gymnasium import RewardWrapper
# Custom Reward Wrapper for Roundabout Environment
class CustomRoundaboutReward(RewardWrapper):
def __init__(self, env):
super().__init__(env)
self.prev_lane = None
def reward(self, reward):
"""Modify reward for roundabout navigation."""
ego_vehicle = self.unwrapped.vehicle
speed = ego_vehicle.speed
lane_index = ego_vehicle.lane_index[2]
# Encourage entering the roundabout (reward for leaving the entrance lane)
enter_bonus = 1.0 if lane_index != 0 else 0
# Encourage maintaining a reasonable speed
speed_bonus = (speed / 30) * 0.5 # Reduced impact to balance safety
# Slight penalty for standing still too long
idle_penalty = -0.3 if speed < 1.0 else 0
# Reduce lane-switch hesitation penalty (allow more flexibility)
lane_switch_penalty = -0.1 if self.prev_lane is not None and self.prev_lane != lane_index else 0
# Update lane memory
self.prev_lane = lane_index
# Compute final reward
new_reward = reward + enter_bonus + speed_bonus + idle_penalty + lane_switch_penalty
return new_reward
# Create and wrap the environment
env = gym.make("roundabout-v0", render_mode="rgb_array")
env = CustomRoundaboutReward(env)
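Before committing 40,000 training steps to the shaped reward, a quick smoke test (our addition; the random actions are purely illustrative) can confirm that the bonuses and penalties fire as intended. It reuses the wrapped `env` created just above.
# Smoke test (illustrative): roll a few random actions through the wrapped
# env and print speed, lane index, and the shaped reward at each step.
obs, _ = env.reset(seed=0)
for step in range(5):
    action = env.action_space.sample()  # random action, illustration only
    obs, shaped_reward, terminated, truncated, info = env.step(action)
    ego = env.unwrapped.vehicle
    print(f"step {step}: speed={ego.speed:5.1f}  lane={ego.lane_index[2]}  shaped_reward={shaped_reward:5.2f}")
    if terminated or truncated:
        obs, _ = env.reset()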
# Load previously trained model (`hyper4`)
previous_model = PPO.load(f"{saved_model_dir}/ppo_merge_hyper4.zip", env=env)
# Recreate the network architecture used by the trained model
policy_kwargs = dict(
activation_fn=nn.ReLU,
net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128])
)
# Reinitialize PPO with updated hyperparameters **but keep trained policy, critic, and optimizer**
model = PPO(
"MlpPolicy",
env,
learning_rate=5e-4, # Reduce learning rate for more stable updates
gamma=0.8, # Low discount factor to prioritize short-term gains such as entering traffic
tensorboard_log=tb_log_dir+'/PPO_roundabout_hyper5', # New log directory
verbose=0,
ent_coef=0.03, # Slightly higher entropy coefficient for more exploration
policy_kwargs=policy_kwargs, # Maintain previous architecture
)
# **Load policy & critic weights from previous training**
# (the full policy state_dict already contains the value network, so the
# explicit value_net load below is redundant but harmless)
model.policy.load_state_dict(previous_model.policy.state_dict())
model.policy.value_net.load_state_dict(previous_model.policy.value_net.state_dict())
# **Load optimizer state** (this ensures learning doesn't restart from scratch)
model.policy.optimizer.load_state_dict(previous_model.policy.optimizer.state_dict())
# Continue training with the new hyperparameters
model.learn(
total_timesteps=int(4e4), # Fine-tune for another 40,000 steps
reset_num_timesteps=False # Continue from previous learning
)
# Save the fine-tuned model
model.save(f"{saved_model_dir}/ppo_roundabout_hyper5.zip")%tensorboard --logdir './tensorboard_logs'Reusing TensorBoard on port 6010 (pid 56985), started 0:00:01 ago. (Use '!kill 56985' to kill it.)
vdoFile = "/ppo_rounabout_hyper5.mp4"
modelPath = '../saved_models/PPO/ppo_roundabout_hyper5.zip'
vdir = './videos/roundabout/PPO'
# Create a new environment for testing
env = gym.make("roundabout-v0", render_mode="rgb_array")
env.reset()
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
MoviePy - Building video ./videos/roundabout/PPO/ppo_rounabout_hyper5.mp4.
MoviePy - Writing video ./videos/roundabout/PPO/ppo_rounabout_hyper5.mp4
MoviePy - Done !
MoviePy - video ready ./videos/roundabout/PPO/ppo_rounabout_hyper5.mp4
show_video(vdir+vdoFile)
To fully eliminate hesitation and improve merging confidence, the following changes were made:
- ent_coef=0.015: encourages faster and more decisive actions.
- gamma=0.98: helps the agent plan ahead instead of overreacting to immediate events.
- clip_range=0.1: ensures gradual and stable policy updates.
- learning_rate=6e-4: allows quicker adaptation to the improved reward shaping.
- Train hyper6 for 50,000 steps.

This is the final fine-tuning step before testing real-world-like performance.
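For context on the clip_range choice: PPO maximizes the standard clipped surrogate objective

L^CLIP(θ) = E_t[ min( r_t(θ) Â_t , clip(r_t(θ), 1−ε, 1+ε) Â_t ) ],  where r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t),

so setting ε = clip_range = 0.1 confines each update's probability ratio to [0.9, 1.1], roughly half the movement allowed by the stable-baselines3 default of 0.2; this is what makes the updates gradual and stable.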
import gymnasium as gym
from stable_baselines3 import PPO
import torch.nn as nn
from gymnasium import RewardWrapper
# Custom Reward Wrapper for Roundabout Environment
class CustomRoundaboutReward(RewardWrapper):
def __init__(self, env):
super().__init__(env)
self.prev_lane = None
self.entry_timer = 0 # Track how long the agent hesitates at entry
def reward(self, reward):
"""Modify reward for roundabout navigation."""
ego_vehicle = self.unwrapped.vehicle
speed = ego_vehicle.speed
lane_index = ego_vehicle.lane_index[2]
# Encourage entering the roundabout (reward for leaving the entrance lane)
enter_bonus = 1.5 if lane_index != 0 else -0.2 # Penalty for staying at entry too long
# Encourage maintaining a reasonable speed
speed_bonus = (speed / 30) * 0.5 # Reward for steady speeds
# Stronger penalty for idling too long before entering
if lane_index == 0: # Still at entry?
self.entry_timer += 1
idle_penalty = -0.5 if self.entry_timer > 30 else 0 # Hesitation penalty
else:
self.entry_timer = 0 # Reset timer if agent enters
idle_penalty = 0 # Ensure idle_penalty is defined
# Reduce lane-switch hesitation penalty
lane_switch_penalty = -0.1 if self.prev_lane is not None and self.prev_lane != lane_index else 0
self.prev_lane = lane_index # Update lane memory
# Compute final reward
new_reward = reward + enter_bonus + speed_bonus + idle_penalty + lane_switch_penalty
return new_reward
# Create and wrap the environment
env = gym.make("roundabout-v0", render_mode="rgb_array")
env = CustomRoundaboutReward(env)
# Load previously trained model (`Hyper5`)
previous_model = PPO.load(f"{saved_model_dir}/ppo_roundabout_hyper5.zip", env=env)
# Maintain architecture but update hyperparameters
policy_kwargs = dict(
activation_fn=nn.ReLU,
net_arch=dict(pi=[256, 256, 128], vf=[256, 256, 128])
)
# Final Hyper6 Model Adjustments
model = PPO(
"MlpPolicy",
env,
learning_rate=6e-4, # Faster adaptation
gamma=0.98, # Planning further ahead
ent_coef=0.015, # Reduce hesitation, commit to decisions faster
clip_range=0.1, # More stable updates
tensorboard_log=tb_log_dir+'/PPO_roundabout_hyper6',
verbose=0,
policy_kwargs=policy_kwargs,
)
# Load previous policy & value function weights
model.policy.load_state_dict(previous_model.policy.state_dict())
model.policy.value_net.load_state_dict(previous_model.policy.value_net.state_dict())
model.policy.optimizer.load_state_dict(previous_model.policy.optimizer.state_dict())
# Continue training with updated parameters
model.learn(
total_timesteps=int(5e4), # Training for 50,000 steps
reset_num_timesteps=False
)
# Save the fine-tuned model
model.save(f"{saved_model_dir}/ppo_roundabout_hyper6.zip")%tensorboard --logdir './tensorboard_logs'Reusing TensorBoard on port 6010 (pid 56985), started 0:37:52 ago. (Use '!kill 56985' to kill it.)
vdoFile = "/ppo_rounabout_hyper6.mp4"
modelPath = '../saved_models/PPO/ppo_roundabout_hyper6.zip'
vdir = './videos/roundabout/PPO'
# Create a new environment for testing
env = gym.make("roundabout-v0", render_mode="rgb_array")
env.reset()
run_record(vdir=vdir, modelPath=modelPath, vdofileName=vdoFile, env=env, forHowLong=10, freq=20)
MoviePy - Building video ./videos/roundabout/PPO/ppo_rounabout_hyper6.mp4.
MoviePy - Writing video ./videos/roundabout/PPO/ppo_rounabout_hyper6.mp4
MoviePy - Done !
MoviePy - video ready ./videos/roundabout/PPO/ppo_rounabout_hyper6.mp4
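As a final head-to-head check, the three checkpoints can be compared on crash rate and mean return in roundabout-v0. This is a sketch of our own; the checkpoint paths are taken from the save calls above and the episode count is kept small for a quick smoke test.
# Comparison sketch (our addition): evaluate each fine-tuned checkpoint.
import gymnasium as gym
from stable_baselines3 import PPO

checkpoints = {
    "hyper4 (merge)": '../saved_models/PPO/ppo_merge_hyper4.zip',
    "hyper5": '../saved_models/PPO/ppo_roundabout_hyper5.zip',
    "hyper6": '../saved_models/PPO/ppo_roundabout_hyper6.zip',
}
cmp_env = gym.make("roundabout-v0", render_mode="rgb_array")
for name, path in checkpoints.items():
    model = PPO.load(path, env=cmp_env)
    crashes, total_return, n_eps = 0, 0.0, 10
    for _ in range(n_eps):
        obs, _ = cmp_env.reset()
        done = False
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = cmp_env.step(action)
            total_return += reward
            done = terminated or truncated
        crashes += int(info.get("crashed", False))
    print(f"{name}: crash rate {crashes}/{n_eps}, mean return {total_return/n_eps:.2f}")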